Generative Spoken Dialogue Language Modeling
Abstract
We introduce dGSLM, the first “textless” model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised unit discovery coupled with a dual-tower transformer architecture with cross-attention, trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model is able to generate speech, laughter, and other paralinguistic signals in the two channels simultaneously and reproduces more fluid turn taking compared to a text-based cascaded model.
Related resources
Spoken Language Tutorial Dialogue
An existing natural language speech dialogue system will be integrated with an existing mathematics tutor to provide adaptive instruction for grade school children. The resulting dialogue tutor will use general linguistic knowledge, including a fairly complex model of English and task-oriented dialogues, to speak to the student when he or she has trouble solving a problem. The tutor will initiat...
Spoken Language Dialogue Systems
Preface: The present report is the first one in a series from the strategic research programme: Spoken Language Dialogue Systems. The programme is sponsored by the Danish Technical Research Council, and runs for a four-year period (primo 1991 to primo 1995). The participants are the Speech Technology Centre (STC) The objective of the programme is to develop human-computer spoken dialogue system pro...
Sequential Dialogue Context Modeling for Spoken Language Understanding
Spoken Language Understanding (SLU) is a key component of goal-oriented dialogue systems that would parse user utterances into semantic frame representations. Traditionally, SLU does not utilize the dialogue history beyond the previous system turn, and contextual ambiguities are resolved by the downstream components. In this paper, we explore novel approaches for modeling dialogue context in a re...
Multi-channel sentence classification for spoken dialogue language modeling
In traditional language modeling, word prediction is based on the local context (e.g. n-gram). In spoken dialog, language statistics are affected by the multidimensional structure of the human-machine interaction. In this paper we investigate the statistical dependencies of users’ responses with respect to the system’s and user’s channel. The system channel components are the prompts’ text, dial...
User Modeling For Spoken Dialogue
Automatic speech dialogue systems are becoming common. In order to assess their performance, a large sample of real dialogues has to be collected and evaluated. This process is expensive, labor-intensive, and prone to errors. To alleviate this situation we propose a user simulation to conduct dialogues with the system under investigation. Using stochastic modeling of real users we can both deb...
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2023
ISSN: 2307-387X
DOI: https://doi.org/10.1162/tacl_a_00545